# EGC442 Class Notes 5/9/2023

Baback Izadi, Division of Engineering Programs, bai@engr.newpaltz.edu

### Final:

#### Comprehensive

- Performance problems
- ALU design
- Data Path and control
- Pipelining design and hazards
- Cache memory
- Virtual memory
- Parallel Computing

1. Review notes
2. Redo quizzes
3. Redo tests
4. HWs

### Making a Faster Adder

Let's look at a 1-bit ALU for addition:



*[Figure: 1-bit full adder with Carry In]*
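
A minimal Python sketch of the 1-bit full adder, assuming bits are represented as the integers 0 and 1 (the name `full_adder` is hypothetical). The carry-out uses the same three product terms as the carry equations below, and the loop checks the adder against ordinary integer addition for all eight input combinations.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder; a, b, carry_in are 0 or 1. Returns (sum, carry_out)."""
    s = a ^ b ^ carry_in                                   # 1 when an odd number of inputs are 1
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)  # majority of the three inputs
    return s, carry_out


# Exhaustive check against integer addition: 2*carry_out + sum == a + b + carry_in
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```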

### Problem with Ripple Carry

- Is a 32-bit ALU as fast as a 1-bit ALU?
- Is there more than one way to do addition?
  - two extremes: ripple carry and sum-of-products

Can you see the ripple? How could you get rid of it?
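
The ripple can be made concrete with a sketch that chains full adders into a 32-bit adder, assuming unsigned operands; `full_adder` and `ripple_carry_add` are hypothetical names. The full adder for bit k cannot produce its outputs until the carry from bit k-1 is available, which is exactly what the sequential loop models.

```python
MASK32 = (1 << 32) - 1  # all ones in a 32-bit word


def full_adder(a, b, carry_in):
    """1-bit full adder on 0/1 inputs; returns (sum, carry_out)."""
    return a ^ b ^ carry_in, (b & carry_in) | (a & carry_in) | (a & b)


def ripple_carry_add(x, y, width=32, carry_in=0):
    """Add two unsigned `width`-bit integers with a chain of full adders.

    Returns (sum modulo 2**width, carry out of the most significant bit).
    """
    result, carry = 0, carry_in
    for k in range(width):  # bit k must wait for the carry out of bit k-1
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result, carry
```

For example, `ripple_carry_add(MASK32, 1)` returns `(0, 1)`: the carry generated at bit 0 has to travel through all 32 positions before the final carry-out is known.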

$$c_{1} = b_{0}c_{0} + a_{0}c_{0} + a_{0}b_{0}$$

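$$c_{2} = b_{1}c_{1} + a_{1}c_{1} + a_{1}b_{1} \qquad c_{2} = a_{1}b_{1} + a_{1}a_{0}b_{0} + a_{1}a_{0}c_{0} + a_{1}b_{0}c_{0} + b_{1}a_{0}b_{0} + b_{1}a_{0}c_{0} + b_{1}b_{0}c_{0}$$

$$c_{3} = b_{2}c_{2} + a_{2}c_{2} + a_{2}b_{2} \qquad \text{(15 product terms after substituting } c_{2}\text{)}$$

$$c_{4} = b_{3}c_{3} + a_{3}c_{3} + a_{3}b_{3} \qquad \text{(31 product terms after substituting } c_{3}\text{)}$$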

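Not feasible! Why? Each substitution more than doubles the number of product terms, so two-level sum-of-products logic for the high-order carries of a 32-bit adder would need an impractical number of gates with enormous fan-in.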
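
To see the growth, here is a small Python sketch that expands the carry equations into sum-of-products form by repeated substitution and counts the product terms; the helper names are hypothetical. It also checks each expansion against the rippled recurrence for every input combination.

```python
from itertools import product


def carry_sop(i):
    """Expand c_i into sum-of-products form.

    Each product term is a frozenset of variable names; the expansion starts from
    c0 and substitutes c_{k+1} = b_k c_k + a_k c_k + a_k b_k for k = 0 .. i-1.
    """
    terms = {frozenset({"c0"})}
    for k in range(i):
        a, b = f"a{k}", f"b{k}"
        terms = ({frozenset({a, b})}
                 | {t | {a} for t in terms}
                 | {t | {b} for t in terms})
    return terms


def eval_sop(terms, env):
    """OR of ANDs: 1 if any product term has all its variables equal to 1."""
    return int(any(all(env[v] for v in t) for t in terms))


def ripple_carry(i, env):
    """Compute c_i with the rippled recurrence, one bit position at a time."""
    c = env["c0"]
    for k in range(i):
        a, b = env[f"a{k}"], env[f"b{k}"]
        c = (b & c) | (a & c) | (a & b)
    return c


for i in range(1, 5):
    terms = carry_sop(i)
    names = ["c0"] + [f"{x}{k}" for k in range(i) for x in "ab"]
    for bits in product((0, 1), repeat=len(names)):
        env = dict(zip(names, bits))
        assert eval_sop(terms, env) == ripple_carry(i, env)
    print(f"c{i}: {len(terms)} product terms")  # 3, 7, 15, 31
```

The counts follow 2^(i+1) - 1, so extrapolating the pattern, c_32 would have 2^33 - 1 (about 8.6 billion) product terms.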

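### 16-Bit Carry Lookahead (cont.)

Gate-delay comparison (handwritten annotations, partly illegible in the source):

- 16 bits: 12 gate delays
- 32 bits built from eight 4-bit blocks: 8 × 3 = 24 gate delays
- 32-bit ripple carry: 32 × 2 = 64 gate delays
- 32 bits using 16-bit blocks: 10 gate delays (reading uncertain)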
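
A carry-lookahead block computes every carry directly from the inputs instead of waiting for the previous bit. The sketch below assumes the usual generate/propagate formulation (g_i = a_i b_i, p_i = a_i + b_i, so c_{i+1} = g_i + p_i c_i, flattened) for a single 4-bit block; `cla4` is a hypothetical name. Because every carry is a sum of products of g, p, and c0, no carry has to wait for another carry.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead adder block.

    a, b: 4-bit unsigned ints; c0: carry in (0 or 1).
    Returns (4-bit sum, [c1, c2, c3, c4]).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate:  g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: p_i = a_i OR b_i

    # Each carry depends only on g, p and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries_in = [c0, c1, c2, c3]  # carry into bit i
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ carries_in[i]) & 1) << i
    return s, [c1, c2, c3, c4]


# Exhaustive check: every carry matches the one produced by ordinary addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            s, carries = cla4(a, b, c0)
            assert s + (carries[3] << 4) == a + b + c0
            for k in range(1, 5):
                mask = (1 << k) - 1
                assert carries[k - 1] == ((a & mask) + (b & mask) + c0) >> k
```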

## Decreasing Miss Ratio with Associativity

Associativity: Reducing cache misses by more flexible placement of blocks



#### 4-Way Associative Cache Organization



### Associative Caches

- Fully associative
  - Allow a given block to go in any cache entry
  - Requires all entries to be searched at once
  - Comparator per entry (expensive)
- n-way set associative
  - Each set contains n entries
  - Block number determines which set
    - (Block number) modulo (#Sets in cache)
  - Search all entries in a given set at once
  - n comparators (less expensive)
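
The placement rule above can be sketched in Python for a 4-block cache and the block sequence used in the associativity example; `set_index` is a hypothetical helper. Direct mapped is the 1-way case and fully associative is the case with a single set, so one modulo formula covers all three organizations, and the number of comparators per lookup equals the associativity.

```python
def set_index(block_number, num_sets):
    """(Block number) modulo (#Sets in cache)."""
    return block_number % num_sets


CACHE_BLOCKS = 4            # total block entries in the cache
SEQUENCE = [0, 8, 0, 6, 8]  # block addresses accessed

for ways, name in [(1, "direct mapped"), (2, "2-way set associative"), (4, "fully associative")]:
    num_sets = CACHE_BLOCKS // ways
    indices = [set_index(b, num_sets) for b in SEQUENCE]
    # A lookup compares the tag against every entry of one set: `ways` comparators.
    print(f"{name}: {num_sets} set(s), {ways} comparator(s), indices {indices}")
```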

### Associativity Example

- Assume word size = 1 byte
- Compare 4-block caches
  - Direct mapped, 2-way set associative, fully associative
- Block access sequence: 0, 8, 0, 6, 8

#### Direct mapped

Cache content after each access:

| Block address | Cache index | Hit/miss | Index 0 | Index 1 | Index 2 | Index 3 |
|---------------|-------------|----------|---------|---------|---------|---------|
| 0             | 0           | miss     | Mem[0]  |         |         |         |
| 8             | 0           | miss     | Mem[8]  |         |         |         |
| 0             | 0           | miss     | Mem[0]  |         |         |         |
| 6             | 2           | miss     | Mem[0]  |         | Mem[6]  |         |
| 8             | 0           | miss     | Mem[8]  |         | Mem[6]  |         |

#### 2-way set associative

Cache content after each access:

| Block address | Cache index | Hit/miss | Set 0, way 0 | Set 0, way 1 | Set 1, way 0 | Set 1, way 1 |
|---------------|-------------|----------|--------------|--------------|--------------|--------------|
| 0             | 0           | miss     | Mem[0]       |              |              |              |
| 8             | 0           | miss     | Mem[0]       | Mem[8]       |              |              |
| 0             | 0           | hit      | Mem[0]       | Mem[8]       |              |              |
| 6             | 0           | miss     | Mem[0]       | Mem[6]       |              |              |
| 8             | 0           | miss     | Mem[8]       | Mem[6]       |              |              |

#### Fully associative

Cache content after each access:

| Block address | Hit/miss | Entry 0 | Entry 1 | Entry 2 | Entry 3 |
|---------------|----------|---------|---------|---------|---------|
| 0             | miss     | Mem[0]  |         |         |         |
| 8             | miss     | Mem[0]  | Mem[8]  |         |         |
| 0             | hit      | Mem[0]  | Mem[8]  |         |         |
| 6             | miss     | Mem[0]  | Mem[8]  | Mem[6]  |         |
| 8             | hit      | Mem[0]  | Mem[8]  | Mem[6]  |         |
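
The three tables can be reproduced with a short cache simulator. This is a sketch assuming least-recently-used (LRU) replacement within each set, which is the policy that matches the 2-way table, and one word per block; `simulate` is a hypothetical name.

```python
from collections import OrderedDict


def simulate(blocks, num_blocks, ways):
    """Simulate an LRU set-associative cache holding `num_blocks` one-word blocks.

    ways=1 gives direct mapped; ways=num_blocks gives fully associative.
    Returns 'hit' or 'miss' for each block address in `blocks`.
    """
    num_sets = num_blocks // ways
    # One OrderedDict per set, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for block in blocks:
        cache_set = sets[block % num_sets]  # (block number) modulo (#sets)
        if block in cache_set:
            cache_set.move_to_end(block)    # now the most recently used
            results.append("hit")
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict the least recently used block
            cache_set[block] = None
            results.append("miss")
    return results


sequence = [0, 8, 0, 6, 8]
for ways, name in [(1, "Direct mapped"), (2, "2-way set associative"), (4, "Fully associative")]:
    outcome = simulate(sequence, 4, ways)
    print(f"{name}: {outcome} ({outcome.count('hit')} hits)")
```

Direct mapped gets 0 hits, 2-way set associative gets 1, and fully associative gets 2, matching the hit/miss columns above.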



### Pitfall: Amdahl's Law



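Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{n} + T_{\text{unaffected}}$$

Running 4 times faster means 100 / 4 = 25 seconds:

$$25 = \frac{80}{n} + 20 \quad\Rightarrow\quad n = 16$$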
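
The same calculation as a Python sketch, assuming times in seconds; `required_factor` is a hypothetical helper that solves Amdahl's equation for n and returns `None` when no finite n is enough.

```python
def required_factor(total, affected, target_speedup):
    """Amdahl's Law solved for n: total/target = affected/n + (total - affected).

    Returns the factor n by which the affected part must speed up, or None
    if the unaffected part alone already reaches or exceeds the target time.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    if target_time <= unaffected:
        return None  # would need affected/n <= 0: impossible for any finite n
    return affected / (target_time - unaffected)


print(required_factor(100, 80, 4))  # 16.0
print(required_factor(100, 80, 5))  # None
```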

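How much improvement in multiply performance is needed to get 5× overall? The target time is 100 / 5 = 20 seconds:

$$20 = \frac{80}{n} + 20$$

Can't be done! The 20 seconds not spent on multiply already use the entire budget, so 80/n would have to be 0.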

Pitfall: improving an aspect of a computer and expecting a proportional improvement in overall performance. Corollary: make the common case fast.
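
The pitfall is easy to see numerically. This sketch (hypothetical helper `overall_speedup`) uses the same 100 s / 80 s split as the examples: however large n gets, the overall speedup stays below 100 / 20 = 5, because the 20 seconds not spent on multiply are untouched.

```python
def overall_speedup(total, affected, n):
    """Overall speedup when only `affected` seconds of `total` run n times faster."""
    new_time = affected / n + (total - affected)
    return total / new_time


for n in (2, 4, 16, 100, 1000):
    print(f"multiply {n:>4}x faster -> program {overall_speedup(100, 80, n):.3f}x faster")
```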

